Forum:EDA Sidebar species
__TOC__

Introduction

This is a breakdown of the topic Forum:Exploratory Data Analysis for the species sidebar. Data source as of 2015-11-16, downloaded with dumpgenerator.py from https://github.com/WikiTeam/wikiteam

Comments
* There are 152 species sidebars
* date has only six entries (2374 | 2154 | 2364 | 24th century)
* name has only five entries (Parada/Paradan | Quarren | Terrellian | Ventu (Ledosians)), including one repetition of Terrellian
* planet has 72 entries and 78 NA plus 2 empty values
* planet2 has only one entry (Voth city ship) from Voth
* pop has only five entries (A few thousand | Extinct | Nearing extinction)
* quadrant has 119 entries and 29 NA. The categories are: Alpha Quadrant | Beta Quadrant | Delta Quadrant | Gamma Quadrant | Gamma Quadrant;Alpha Quadrant
* quadrant is Gamma Quadrant;Alpha Quadrant for Jem'Hadar
* quadrant2 has only one entry (Delta Quadrant for Voth)
* type has five NA (Swarm species | Borg)
* type categories are: (Humanoid | Non-corporeal species | Non-humanoid | Reptile | Saurian | Shapeshifter | Unknown)
* type is Saurian for Voth and is Unknown for Ba'neth

Variables found
* "date"
* "image"
* "image2"
* "image3"
* "imagealt"
* "imagealt2"
* "imagecap"
* "imagecap2"
* "imagecap3"
* "logo"
* "logoalt"
* "name"
* "planet"
* "planet2"
* "pop"
* "quadrant"
* "quadrant2"
* "type"

Data

In the table below, the columns are:
; Variable : Name of the tag found in the sidebar, regardless of whether it is in the template or not.
; Not listed : Number of times the Variable was not used in the sidebar.
; Empty : Number of times the Variable was used and its content is empty.
; Has Content : Number of times the Variable was used and has some content.
; First 5 : First five occurrences of the Variable found in the database.
; First 5 Categories : First five occurrences of the classifications used by the variable.

Discussion

A couple of odd points that come from your notes above. There should have been far more than 42 of these.
I counted almost 150 as I was converting them all today. That raises a second point -- they've all been converted to a new format. Ha.

A: 1) As mentioned, I am using the last available data dump, from 2015.07.18, so we are likely not talking about the same data. I am happy to re-run the scripts (available at Forum:EDA Scripts) if a fresh dataset is made available.
2) I found some "*/temp" entries that I was not checking before (the "/" was not accounted for).
3) The regular expression looks for "{ {sidebar" up to "\n}}". If a sidebar ends with only "}}" (no preceding newline), the match would either run to the very end of the text or be cut short by some links inside the sidebar itself.

In the original, "Species" was the valid variable, though I updated that to "species" previously, and it's now been updated to "name". It should only be used where there is a desire to show a different title on the sidebar than the name of the page itself.

A: As I haven't dealt with the data before, and lack historical context, I am just touching the data as it is (or was on 2015.07.18) and reporting on what comes out of it.

Some of the variables listed above in your original data query were actually invalid variables (including "image7" and its offshoots). -- sulfur (talk) 19:13, September 24, 2015 (UTC)

A: To be clear, I downloaded the data for some personal use, and once I saw some, say, issues, I thought it could be helpful to report them so the community could address them. The intention was to help, not criticise. :) -- DataScientist (talk) 09:01, November 16, 2015 (UTC)
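
For readers who want to reproduce counts like the ones above, here is a minimal Python sketch of the approach described in point 3 and in the Data column definitions. It is not the script from Forum:EDA Scripts, which is not shown in this thread. The names `SIDEBAR_RE`, `FIELD_RE`, `parse_sidebars` and `tally` are hypothetical. The "one `|key = value` pair per line" layout is an assumption about how the sidebars are written, and so is matching "{{sidebar" case-insensitively.

```python
import re
from collections import Counter

# Assumed pattern: from "{{sidebar" up to the first following "\n}}".
SIDEBAR_RE = re.compile(r"\{\{sidebar(.*?)\n\}\}", re.DOTALL | re.IGNORECASE)
# Assumed field layout: one "|key = value" pair per line.
FIELD_RE = re.compile(r"^\|[ \t]*([^=|\n]+?)[ \t]*=[ \t]*(.*)$", re.MULTILINE)


def parse_sidebars(wikitext):
    """Return one {variable: value} dict per sidebar found in the text."""
    sidebars = []
    for match in SIDEBAR_RE.finditer(wikitext):
        fields = {}
        for key, value in FIELD_RE.findall(match.group(1)):
            fields[key] = value.strip()
        sidebars.append(fields)
    return sidebars


def tally(sidebars):
    """Count Not listed / Empty / Has Content for every variable seen."""
    variables = sorted({key for fields in sidebars for key in fields})
    table = {}
    for var in variables:
        counts = Counter()
        for fields in sidebars:
            if var not in fields:
                counts["Not listed"] += 1
            elif fields[var] == "":
                counts["Empty"] += 1
            else:
                counts["Has Content"] += 1
        # Counter returns 0 for labels that never occurred.
        table[var] = {
            label: counts[label]
            for label in ("Not listed", "Empty", "Has Content")
        }
    return table
```

Calling `tally(parse_sidebars(page_text))` on the text of one or more pages gives a dict keyed by variable name. Note that with this pattern a sidebar whose closing "}}" sits on the same line as its last field is not terminated there: the match keeps going until a later "\n}}" turns up, or fails if there is none, which is the kind of miscount discussed in point 3.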